Smart Paper Notes

Graph Federated Learning with Hidden Representation Sharing

Shuang Wu , Mingxuan Zhang , Yuantong Li , Carl Yang , Pan Li

Category: Machine Learning

2022-12-23

Learning on Graphs (LoG) is widely used in multi-client systems when each client has insufficient local data, and multiple clients have to share their raw data to learn a model of good quality. One scenario is to recommend items to clients with limited historical data and sharing similar preferences with other clients in a social network. On the other hand, due to the increasing demands for the protection of clients' data privacy, Federated Learning (FL) has been widely adopted: FL requires models to be trained in a multi-client system and restricts sharing of raw data among clients. The underlying potential data-sharing conflict between LoG and FL is under-explored and how to benefit from both sides is a promising problem. In this work, we first formulate the Graph Federated Learning (GFL) problem that unifies LoG and FL in multi-client systems and then propose sharing hidden representation instead of the raw data of neighbors to protect data privacy as a solution. To overcome the biased gradient problem in GFL, we provide a gradient estimation method and its convergence analysis under the non-convex objective. In experiments, we evaluate our method in classification tasks on graphs. Our experiment shows a good match between our theory and the practice.
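The idea of sharing hidden representations rather than raw neighbor data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration (a one-layer ReLU encoder, mean aggregation, and the names `hidden` and `client_forward`); it is not the model or the gradient estimator from the paper.

```python
import numpy as np

def hidden(W, x):
    """Hypothetical one-layer encoder: the only thing a client would share."""
    return np.maximum(W @ x, 0.0)

def client_forward(i, features, neighbors, W_hidden, w_out):
    """Prediction for client i using its own data plus neighbors' hidden vectors.

    In a real deployment each neighbor j would compute hidden(W_hidden, features[j])
    locally and send only that vector; raw features[j] would never leave client j.
    """
    own = hidden(W_hidden, features[i])
    shared = [hidden(W_hidden, features[j]) for j in neighbors[i]]
    aggregated = np.mean([own] + shared, axis=0)  # mean aggregation (assumed)
    return float(w_out @ aggregated)

features = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 1.0])}
neighbors = {0: [1, 2]}
prediction = client_forward(0, features, neighbors, np.eye(2), np.ones(2))
```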

translated by Google Translate

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

Haoli Bai , Zhiguang Liu , Xiaojun Meng , Wentao Li , Shuang Liu , Nian Xie , Rongfu Zheng , Liangwei Wang , Lu Hou , Jiansheng Wei

Category: Natural Language Processing | Computer Vision

2022-12-19

Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, which can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.
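A contrastive objective of the kind named above can be sketched as a symmetric InfoNCE loss over paired region and text embeddings. This is a generic CLIP-style formulation assumed for illustration; the function name, the temperature value, and the embeddings are hypothetical, and the paper's exact loss may differ.

```python
import numpy as np

def textline_region_contrastive_loss(regions, texts, temperature=0.07):
    """Symmetric InfoNCE loss; row k of `regions` is paired with row k of `texts`."""
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    t = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    sim = (r @ t.T) / temperature  # cosine similarities scaled by temperature

    def diagonal_cross_entropy(scores):
        # Row-wise log-softmax, then score the matching (diagonal) pairs.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the region-to-text and text-to-region directions.
    return 0.5 * (diagonal_cross_entropy(sim) + diagonal_cross_entropy(sim.T))

aligned = textline_region_contrastive_loss(np.eye(4), np.eye(4))
shuffled = textline_region_contrastive_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

With matched pairs the loss is close to zero, while mismatched pairs are penalized heavily, which is the pressure that pulls each textline's visual region toward its own text.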


Artificial Intelligence Security Competition (AISC)

Yinpeng Dong , Peng Chen , Senyou Deng , Lianji L , Yi Sun , Hanyu Zhao , Jiaxing Li , Yunteng Tan , Xinyu Liu , Yangyi Dong

Category: Artificial Intelligence | Computer Vision | Machine Learning

2022-12-07

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.


The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery

Omid Ghorbanzadeh , Yonghao Xu , Hengwei Zhao , Junjue Wang , Yanfei Zhong , Dong Zhao , Qi Zang , Shuang Wang , Fahong Zhang , Yilei Shi

Category: Computer Vision

2022-09-06

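The scientific outcomes of the 2022 Landslide4Sense (L4S) competition, organized by the Institute of Advanced Research in Artificial Intelligence (IARAI), are presented here. The objective of the competition is to automatically detect landslides based on large-scale, multi-source satellite imagery collected globally. The 2022 L4S aims to foster interdisciplinary research on recent developments in deep learning (DL) models for semantic segmentation of satellite imagery. In the past few years, DL-based models have achieved performance that meets expectations on image interpretation, thanks to the development of convolutional neural networks (CNNs). The main objective of this article is to present the details of the competition and its best-performing algorithms. The winning solutions build on state-of-the-art models such as Swin Transformer, SegFormer, and U-Net. Advanced machine learning techniques and strategies such as hard example mining, self-training, and mix-up data augmentation are also considered. Moreover, we describe the L4S benchmark dataset to facilitate further comparisons, and we report the results of the accuracy assessment online. The data are accessible on the Future Development Leaderboard for future evaluation at https://www.iarai.ac.ac.at/landslide4sense/challenge/, and researchers are invited to submit more prediction results, evaluate the accuracy of their methods, compare them with those of other users, and, ideally, improve the landslide detection results reported in this article.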


Federated Learning with Label Distribution Skew via Logits Calibration

Jie Zhang , Zhiqi Li , Bo Li , Jianghe Xu , Shuang Wu , Shouhong Ding , Chao Wu

Category: Machine Learning | Artificial Intelligence

2022-09-01

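Traditional federated optimization methods perform poorly (i.e., with reduced accuracy), especially on highly skewed data. In this paper, we investigate label distribution skew in FL, where the distribution of labels varies across clients. First, we study label distribution skew from a statistical view. We demonstrate both theoretically and empirically that previous methods based on softmax cross-entropy are not suitable, since they can cause local models to overfit heavily to minority classes and missing classes. Additionally, we theoretically introduce a deviation bound to measure the deviation of the gradient after local updates. Finally, we propose FedLC (Federated learning via Logits Calibration), which calibrates the logits according to the probability of occurrence of each class. FedLC applies a fine-grained calibrated cross-entropy loss to the local update by adding a pairwise label margin. Extensive experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model and much improved performance. Furthermore, integrating other FL methods into our approach can further enhance the performance of the global model.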
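A calibrated cross-entropy of this kind can be sketched as follows. The margin used here, `tau * log(local class prior)`, is a standard logit-adjustment form assumed for illustration; the function name, `tau`, and the smoothing constant are hypothetical, and the exact margin used by FedLC may differ.

```python
import numpy as np

def calibrated_cross_entropy(logits, labels, class_counts, tau=1.0, eps=1e-8):
    """Cross-entropy on logits shifted by a per-class margin.

    Assumption: the margin is tau * log(prior), where the prior is estimated
    from the client's local class counts (smoothed so missing classes are finite).
    """
    counts = np.asarray(class_counts, dtype=float) + eps
    prior = counts / counts.sum()
    adjusted = np.asarray(logits, dtype=float) + tau * np.log(prior)
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)  # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.zeros((1, 2))
labels = np.array([1])
balanced = calibrated_cross_entropy(logits, labels, [5, 5])  # equals plain cross-entropy
skewed = calibrated_cross_entropy(logits, labels, [9, 1])    # rare class needs a larger margin
```

With balanced counts the shift is the same for every class and the loss reduces to ordinary softmax cross-entropy. With skewed counts, a sample from the rare class incurs a larger loss until its logit exceeds the others by a margin.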


Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification

Xinyu Lin , Jinxing Li , Zeyu Ma , Huafeng Li , Shuang Li , Kaixiong Xu , Guangming Lu , David Zhang

Category: Computer Vision

2022-08-04

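Thanks to cross-modal retrieval techniques, visible-infrared (RGB-IR) person re-identification (Re-ID) is achieved by projecting the two modalities into a common space, enabling person Re-ID in 24-hour surveillance systems. However, with respect to probe-to-gallery matching, almost all existing RGB-IR cross-modal person Re-ID methods focus on image-to-image matching, while video-to-video matching, which contains much richer spatial and temporal information, remains under-explored. In this paper, we primarily study video-based cross-modal person Re-ID. To this end, we construct a video-based RGB-IR dataset containing 927 valid identities with 463,259 frames and 21,863 tracklets, captured by 12 RGB/IR cameras. Based on this dataset, we show that performance improves as the number of frames in a tracklet increases, demonstrating the significance of video-to-video matching in RGB-IR person Re-ID. In addition, we propose a novel method that not only projects the two modalities into a modal-invariant subspace but also extracts a motion-invariant temporal memory. Thanks to these two strategies, much better results are achieved on our video-based cross-modal person Re-ID. The code and dataset are released at: https://github.com/vcmproject233/mitml.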


Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation

Wenxuan Ma , Jinming Zhang , Shuang Li , Chi Harold Liu , Yulin Wang , Wei Li

Category: Computer Vision

2022-08-02

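Extensive studies on unsupervised domain adaptation (UDA) have propelled the deployment of deep learning from limited experimental datasets into unconstrained real-world domains. Most UDA approaches align features within a common embedding space and apply a shared classifier for target prediction. However, since a perfectly aligned feature space may not exist when the domain discrepancy is large, these methods suffer from two limitations. First, coercive domain alignment deteriorates target-domain discriminability because target label supervision is lacking. Second, the source-supervised classifier is inevitably biased toward source data, so it may underperform in the target domain. To alleviate these issues, we propose to conduct feature alignment simultaneously in two individual spaces, each focusing on a different domain, and to create for each space a domain-oriented classifier tailored to that domain. Specifically, we design a Domain-Oriented Transformer (DOT) that has two individual classification tokens to learn different domain-oriented representations and two classifiers to preserve domain-wise discriminability. A theoretically guaranteed contrastive-based alignment and a source-guided pseudo-label refinement strategy are used to explore both domain-invariant and domain-specific information. Comprehensive experiments validate that our method achieves state-of-the-art results on several benchmarks.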


SP2: A Second Order Stochastic Polyak Method

Shuang Li , William J. Swartworth , Martin Takáč , Deanna Needell , Robert M. Gower

Category: Machine Learning

2022-07-17

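Recently, the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equations by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equations that uses a local second-order approximation of the model. Our resulting method, SP2, uses Hessian-vector products to speed up the convergence of SP. Furthermore, and rather uniquely among second-order methods, the design of SP2 in no way relies on positive definite Hessian matrices or convexity of the objective function. We show that SP2 is very competitive on matrix completion, non-convex test problems, and logistic regression. We also provide a convergence theory on sums of quadratics.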
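The first-order SP update that SP2 builds on can be sketched as below. Linearizing the interpolation equation f_i(w) = 0 around the current iterate and projecting onto its solution set gives the step w - f_i(w) / ||∇f_i(w)||² · ∇f_i(w). The least-squares problem, the helper name `sp_step`, and the iteration count are assumptions chosen for illustration; the second-order SP2 update itself is not reproduced here.

```python
import numpy as np

def sp_step(w, loss_i, grad_i, eps=1e-12):
    """One Stochastic Polyak step, assuming the optimal per-sample loss is 0."""
    g = grad_i(w)
    step_size = loss_i(w) / (g @ g + eps)  # eps guards against a zero gradient
    return w - step_size * g

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
b = A @ w_true  # consistent system, so every sample loss can reach 0 (interpolation)

w = np.zeros(5)
for _ in range(2000):
    i = rng.integers(len(A))
    a_i, b_i = A[i], b[i]
    w = sp_step(
        w,
        lambda v: 0.5 * (a_i @ v - b_i) ** 2,  # per-sample least-squares loss
        lambda v: (a_i @ v - b_i) * a_i,       # its gradient
    )
```

No learning rate is tuned by hand: the step size adapts to the current loss and gradient norm, which is what makes SP attractive as an adaptive alternative to fixed-step SGD.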


Learning Iterative Reasoning through Energy Minimization

Yilun Du , Shuang Li , Joshua B. Tenenbaum , Igor Mordatch

Category: Machine Learning | Artificial Intelligence

2022-06-30

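Deep learning has excelled at complex pattern recognition tasks such as image classification and object recognition. However, it struggles with tasks requiring nontrivial reasoning, such as algorithmic computation. Humans are able to solve such tasks through iterative reasoning, spending more time thinking about harder tasks. Most existing neural networks, however, exhibit a fixed computational budget determined by the network architecture, which prevents additional computation on harder tasks. In this work, we present a new framework for iterative reasoning with neural networks. We train a neural network to parameterize an energy landscape over all outputs, and implement each step of iterative reasoning as an energy minimization step to find a minimal-energy solution. By formulating reasoning as an energy minimization problem, for harder problems that lead to more complex energy landscapes, we can adjust our underlying computational budget by running a more complex optimization procedure. We empirically illustrate that our iterative reasoning approach can solve algorithmic reasoning tasks more accurately and with better generalization in both graph and continuous domains. Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.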
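Reasoning as iterative energy minimization can be illustrated with a toy sketch. Here a hand-written quadratic energy E(y) = ½‖Ay − x‖² stands in for the learned energy network, and plain gradient descent stands in for the optimizer; both choices, and the names `energy` and `reason`, are assumptions for illustration only.

```python
import numpy as np

def energy(A, x, y):
    """Toy energy: low when candidate output y is consistent with input x."""
    residual = A @ y - x
    return 0.5 * residual @ residual

def reason(A, x, steps):
    """Run `steps` energy-minimization steps; more steps means more computation."""
    learning_rate = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    y = np.zeros(A.shape[1])
    for _ in range(steps):
        y = y - learning_rate * A.T @ (A @ y - x)  # gradient of the energy w.r.t. y
    return y

A = np.array([[2.0, 0.0], [0.0, 0.5]])
x = np.array([2.0, 1.0])
quick = reason(A, x, steps=5)       # small budget: still far from the minimum
careful = reason(A, x, steps=500)   # larger budget: converges to y = [1, 2]
```

The number of steps is chosen at inference time rather than fixed by the architecture, so a harder (here, more ill-conditioned) problem can simply be given a larger budget.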


Improving Policy Optimization with Generalist-Specialist Learning

Zhiwei Jia , Xuanlin Li , Zhan Ling , Shuang Liu , Yiran Wu , Hao Su

Category: Machine Learning

2022-06-26

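Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations. We empirically observe that an agent trained on many variations (a generalist) tends to learn faster at the beginning, yet its performance plateaus at a suboptimal level for a long time. In contrast, an agent trained on only a few variations (a specialist) can often achieve high returns under a limited computational budget. To get the best of both worlds, we propose a novel generalist-specialist training framework. Specifically, we first train a generalist on all environment variations. When it fails to improve, we launch a large population of specialists with weights cloned from the generalist, each trained to master a selected small subset of variations. Finally, we resume training the generalist with auxiliary rewards induced by demonstrations from all specialists. In particular, we investigate the timing for starting specialist training and compare strategies for training the generalist with help from the specialists. We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks, including Procgen, Meta-World, and ManiSkill.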
